CONFIDENCE REGIONS FOR HIGH QUANTILES OF A HEAVY
TAILED DISTRIBUTION

By Liang Peng and Yongcheng Qi

Georgia Institute of Technology and University of Minnesota Duluth 

Estimating high quantiles plays an important role in the context 
of risk management. This involves extrapolation of an unknown distri- 
bution function. In this paper we propose three methods, namely, the 
normal approximation method, the likelihood ratio method and the 
data tilting method, to construct confidence regions for high quan- 
tiles of a heavy tailed distribution. A simulation study prefers the 
data tilting method. 

1. Introduction. In estimating high quantiles of an unknown probability 
distribution function, one has to infer beyond the observations. This can be 
done via extrapolating from intermediate quantiles when the underlying 
distribution has a regularly varying tail. An important application of high 
quantiles is to forecast rare events. Some references on this topic include 
[1, 3, 7, 14, 23, 25]. Like tail index estimation, only a part of the upper order 
statistics is involved in the estimation of high quantiles. Recently, Ferreira, 
de Haan and Peng [8] provided a data-driven method to choose the optimal 
sample fraction in terms of asymptotic mean squared errors. 

In this paper we are interested in obtaining confidence regions for high 
quantiles. More specifically, three methods will be investigated, namely, the 
normal approximation method, the likelihood ratio method and the data 
tilting method; see Section 2 for details. We demonstrate by a simulation 
study that the data tilting method is preferred. Our data tilting method is 
similar to the general data tilting method proposed by Hall and Yao [12], 
which is employed to tilt time series data. This general data tilting method 
was applied to interval estimation, robust inference and inference under 
constraints for linear time series. One of its advantages is that it admits a 
wide range of distance functions. Tilting methods in statistics have a long
history; nonparametric techniques involving tilting go back at least to work 
of Grenander [10], which studies nonparametric density estimation under 
monotonicity constraints. 

To the best of our knowledge, not much work has been done in applying data
tilting methods or empirical likelihood methods to statistics of extremes.
The empirical likelihood method, introduced in [17, 18], is a nonparametric 
approach for constructing confidence regions. Like the bootstrap and the 
jackknife, the empirical likelihood method does not need to specify a family 
of distributions for the data. One of the advantages of the empirical like- 
lihood method is that it enables the shape of a region, such as the degree 
of asymmetry in a confidence interval, to be determined automatically by 
the sample. In certain regular cases, empirical likelihood based confidence 
regions are Bartlett correctable; see [6, 11]. For a more complete account
of recent references and developments, we refer to the book by Owen [19].
Recently, Lu and Peng [15] applied the empirical likelihood method to obtain
confidence intervals for the tail index, and Peng [20] generalized the em- 
pirical likelihood method to the case of infinite variance. Here, we propose 
to employ the general data tilting method of Hall and Yao [12] to obtain 
confidence regions for high quantiles. 

We organize this paper as follows. In Section 2 three different methods for 
constructing confidence regions for high quantiles are introduced, and main 
results about asymptotic limits are also presented. In Section 3 simulation 
results are reported for comparisons of the performance of the three methods 
in terms of both coverage probability and approximate interval length, and 
a real data application is also included. Finally, all the proofs are given in 
the Appendix. 

2. Methodologies and main results. Let $X_1, \dots, X_n$ be independent random variables with a common distribution function $F$ which satisfies

(1) $1 - F(x) = \ell(x)\,x^{-\gamma}$ for $x > 0$,

where $\gamma > 0$ is an unknown parameter called the tail index, and $\ell(x)$ is a slowly varying function, that is, $\lim_{t\to\infty} \ell(tx)/\ell(t) = 1$ for all $x > 0$. Let $X_{n,1} \le \cdots \le X_{n,n}$ denote the order statistics of $X_1, \dots, X_n$.

Throughout this paper we assume that $p_n \in (0,1)$ and $p_n \to 0$ as $n \to \infty$. A $100(1 - p_n)\%$ quantile for the distribution $F$ is defined as $x_p = (1 - F)^{\leftarrow}(p_n)$, where $(\cdot)^{\leftarrow}$ denotes the inverse function of $(\cdot)$. The main aim of this paper is to obtain confidence regions for $x_p$.

In order to introduce our methodologies, let us assume temporarily that $F$ has the simpler form

(2) $1 - F(x) = c\,x^{-\gamma}$ for $x > T$.

Put $\delta_i = I(X_i > T)$. Then the likelihood function for the censored data $\{(\delta_i, \max(X_i, T))\}_{i=1}^{n}$ is

$$L(\gamma, c) = \prod_{i=1}^{n} \bigl(c\gamma X_i^{-\gamma-1}\bigr)^{\delta_i} \bigl(1 - cT^{-\gamma}\bigr)^{1-\delta_i}.$$

In the paper we actually take $T = X_{n,n-k}$, where $k = k(n)$ satisfies

(3) $k \to \infty$ and $k/n \to 0$.

Then the likelihood function above becomes

(4) $$L(\gamma, c) = \prod_{i=1}^{n} \bigl(c\gamma X_i^{-\gamma-1}\bigr)^{\delta_i} \bigl(1 - cX_{n,n-k}^{-\gamma}\bigr)^{1-\delta_i}.$$

Next we are ready to present our three methods for constructing confidence regions for $x_p$.

Method I: Normal approximation method. Let $(\hat\gamma_n, \hat c_n)$ denote the maximum likelihood estimator of $(\gamma, c)$, that is, $L(\hat\gamma_n, \hat c_n) = \max_{\gamma>0,\,c>0} L(\gamma, c)$. Then it is easy to check that $\hat c_n = \frac{k}{n} X_{n,n-k}^{\hat\gamma_n}$ and

$$\hat\gamma_n = \Bigl\{ \frac{1}{k} \sum_{i=1}^{k} \log\bigl(X_{n,n-i+1}/X_{n,n-k}\bigr) \Bigr\}^{-1}.$$

Note that $\hat\gamma_n$ is the well-known Hill estimator [13]. Therefore, by (2), a natural estimator for $x_p$ is $\hat x_p = (p_n/\hat c_n)^{-1/\hat\gamma_n}$. In order to derive the asymptotic normality of $\hat x_p$, we need a stricter condition than (1): suppose there exists a function $A(t) \to 0$ (as $t \to \infty$) such that

(5) $$\lim_{t\to\infty} \frac{U(tx)/U(t) - x^{1/\gamma}}{A(t)} = x^{1/\gamma}\,\frac{x^{\rho} - 1}{\rho}$$

for some $\rho < 0$, where $U(x) = \bigl(\frac{1}{1-F}\bigr)^{\leftarrow}(x)$. Then $|A(t)|$ is a regularly varying function with index $\rho$; see [5]. Note that (5) implies (1). The following theorem can be derived from [8].

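Theorem 1. Assume (5) and (3) hold. If $\sqrt{k}\,A(n/k) \to 0$, $np_n = O(k)$ and $\log(np_n)/\sqrt{k} \to 0$ as $n \to \infty$, then

(6) $$\sqrt{k}\,(\hat\gamma_n - \gamma) \xrightarrow{d} N(0, \gamma^2) \quad\text{and}\quad \frac{\hat\gamma_n\sqrt{k}}{\log(k/(np_n))}\,\log\frac{\hat x_p}{x_p} \xrightarrow{d} N(0, 1).$$
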
Hence, based on the above limit, a confidence interval with level $\alpha$ for $x_p$ is

$$I^n_\alpha = \Bigl( \hat x_p \exp\Bigl\{ -z_\alpha \log\Bigl(\frac{k}{np_n}\Bigr) \Big/ \bigl(\hat\gamma_n\sqrt{k}\bigr) \Bigr\},\ \hat x_p \exp\Bigl\{ z_\alpha \log\Bigl(\frac{k}{np_n}\Bigr) \Big/ \bigl(\hat\gamma_n\sqrt{k}\bigr) \Bigr\} \Bigr),$$

where $z_\alpha$ satisfies $P(|N(0,1)| \le z_\alpha) = \alpha$. This confidence interval has asymptotically correct coverage probability $\alpha$, that is, $P(x_p \in I^n_\alpha) \to \alpha$ as $n \to \infty$. The next theorem presents the coverage expansion for $I^n_\alpha$.
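The interval of Method I can be computed directly from the order statistics. The following is a minimal sketch, not the authors' code: the function names `hill_gamma` and `normal_approx_interval` are hypothetical, the exact Pareto sample in the demo is an assumption chosen only because model (2) then holds exactly, and the sketch assumes $k > np_n$ so that $\log(k/(np_n)) > 0$.

```python
import math
import random
from statistics import NormalDist


def hill_gamma(sample, k):
    """Return (gamma_hat, X_{n,n-k}) computed from the k largest observations."""
    xs = sorted(sample)
    threshold = xs[-(k + 1)]  # X_{n,n-k}, the (k+1)-th largest value
    mean_log_excess = sum(math.log(x / threshold) for x in xs[-k:]) / k
    return 1.0 / mean_log_excess, threshold


def normal_approx_interval(sample, k, p_n, alpha=0.90):
    """Return (lower, x_hat, upper) for the normal approximation interval."""
    n = len(sample)
    gamma_hat, threshold = hill_gamma(sample, k)
    # x_hat = (p_n / c_hat)^(-1/gamma_hat) with c_hat = (k/n) * X_{n,n-k}^gamma_hat
    x_hat = threshold * (k / (n * p_n)) ** (1.0 / gamma_hat)
    # z_alpha solves P(|N(0,1)| <= z_alpha) = alpha
    z = NormalDist().inv_cdf((1.0 + alpha) / 2.0)
    half_width = z * math.log(k / (n * p_n)) / (gamma_hat * math.sqrt(k))
    return x_hat * math.exp(-half_width), x_hat, x_hat * math.exp(half_width)


# Illustrative use on simulated Pareto data with tail index 2 (an assumption).
rng = random.Random(0)
data = [rng.paretovariate(2.0) for _ in range(1000)]
lower, x_hat, upper = normal_approx_interval(data, k=100, p_n=0.01)
```

The interval is symmetric on the log scale around $\hat x_p$, so it is asymmetric on the original scale, with the longer arm on the upper side.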

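Theorem 2. Under the conditions of Theorem 1,

$$P\Bigl( \frac{\hat\gamma_n\sqrt{k}}{\log(k/(np_n))} \log\frac{\hat x_p}{x_p} \le x \Bigr) - \Phi(x) = \frac{1}{3\sqrt{k}}\,\phi(x)(1 + 2x^2) - \phi(x)\,\frac{\gamma}{1-\rho}\,\sqrt{k}\,A(n/k) - \frac{1}{2}\,x\,\phi(x)\Bigl(\log\frac{k}{np_n}\Bigr)^{-2} + o\Bigl( \Bigl(\log\frac{k}{np_n}\Bigr)^{-2} + \frac{1}{\sqrt{k}} + \sqrt{k}\,|A(n/k)| \Bigr)$$

uniformly for $-\infty < x < \infty$, where $\Phi(x)$ and $\phi(x)$ denote the distribution function and density function of $N(0,1)$, respectively. Furthermore,

$$P(x_p \in I^n_\alpha) = \alpha - z_\alpha\,\phi(z_\alpha)\Bigl(\log\frac{k}{np_n}\Bigr)^{-2} + o\Bigl( \Bigl(\log\frac{k}{np_n}\Bigr)^{-2} + \frac{1}{\sqrt{k}} + \sqrt{k}\,|A(n/k)| \Bigr).$$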

Remark 1. Theorem 2 shows that $P(x_p \in I^n_\alpha) - \alpha = O((\log n)^{-2})$ in the case $\log(np_n) = O(\log(n))$. This means that the coverage of the normal approximation interval for high quantiles is not very accurate in general. To achieve this asymptotic rate, $k$ can be of order $n^{\epsilon}$ for some $0 < \epsilon < -2\rho/(1 - 2\rho)$. The unknown parameter $\rho$ can be estimated; see, for example, [21].

Method II: Likelihood ratio method. Define $\hat\gamma_n$ and $\hat c_n$ as in Method I. First set

$$l_1 = \max_{\gamma>0,\,c>0} \log L(\gamma, c) = \log L(\hat\gamma_n, \hat c_n).$$

Next we maximize $\log L(\gamma, c)$ subject to

$$\gamma > 0, \quad c > 0, \quad \gamma \log x_p + \log\Bigl(\frac{p_n}{c}\Bigr) = 0,$$

and denote this maximized log-likelihood function by $l_2(x_p)$. Note that the above equation comes from setting $p_n = 1 - F(x_p) = c\,x_p^{-\gamma}$. It is easy to show that

$$l_2(x_p) = \log L(\gamma(\lambda), c(\lambda)),$$

where

$$\gamma(\lambda) = \frac{k}{\sum_{i=1}^{k} (\log X_{n,n-i+1} - \log X_{n,n-k}) + \lambda \log X_{n,n-k} - \lambda \log x_p}, \qquad c(\lambda) = X_{n,n-k}^{\gamma(\lambda)}\,\frac{k - \lambda}{n - \lambda},$$

and $\lambda$ satisfies

(7) $\gamma(\lambda) \log x_p + \log\Bigl(\dfrac{p_n}{c(\lambda)}\Bigr) = 0$,

(8) $\gamma(\lambda) > 0$ and $\lambda < k$.

Therefore, the log-likelihood ratio multiplied by minus two is

$$l(x_p) = -2\bigl(l_2(x_p) - l_1\bigr).$$

Theorem 3. Suppose the conditions in Theorem 1 hold. Then there exists a unique solution to (7) and (8), say, $\hat\lambda(x_p)$, and

(9) $l(x_{p,0}) \xrightarrow{d} \chi^2(1)$,

with $\lambda = \hat\lambda(x_{p,0})$ in the definition of $l_2(x_{p,0})$, where $x_{p,0}$ is the true value of $x_p$.

Therefore, based on the above limit, a confidence region with level $\alpha$ for $x_p$ is

$$I^l_\alpha = \{x_p : l(x_p) \le u_\alpha\},$$

where $u_\alpha$ is the $\alpha$-level critical point of $\chi^2(1)$. This confidence region has asymptotically correct coverage probability $\alpha$, that is, $P(x_p \in I^l_\alpha) \to \alpha$ as $n \to \infty$.
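The statistic $l(x_p)$ of Method II can be evaluated numerically by solving (7) under (8) for $\lambda$. The sketch below is only an illustration under stated assumptions: the names `fit_tail` and `lr_statistic` are hypothetical, plain bisection is a choice made here rather than a method named in the paper, and only the case $x_p > X_{n,n-k}$ is handled. It uses the fact that substituting $\gamma(\lambda)$ and $c(\lambda)$ into (7) gives a function of $\lambda$ that increases from $\log p_n < 0$ to $+\infty$ on the admissible range.

```python
import math


def fit_tail(sample, k):
    """Return (gamma_hat, X_{n,n-k}, sum of logs of the k largest values)."""
    xs = sorted(sample)
    thr = xs[-(k + 1)]
    sum_log_top = sum(math.log(x) for x in xs[-k:])
    gamma_hat = k / (sum_log_top - k * math.log(thr))
    return gamma_hat, thr, sum_log_top


def lr_statistic(sample, k, p_n, x_p):
    """Evaluate l(x_p) = -2 (l_2(x_p) - l_1); sketch for x_p > X_{n,n-k} only."""
    n = len(sample)
    gamma_hat, thr, sum_log_top = fit_tail(sample, k)
    s = gamma_hat * math.log(x_p / thr)
    if s <= 0:
        raise ValueError("this sketch only covers x_p above the threshold")

    def g(lam):
        # Left-hand side of (7) after substituting gamma(lam) and c(lam).
        return s / (1.0 - lam * s / k) + math.log((n - lam) * p_n / (k - lam))

    hi = min(k, k / s)   # g tends to +infinity as lam increases to this bound
    lo = hi - 1.0
    while g(lo) >= 0:    # g tends to log(p_n) < 0 as lam tends to -infinity
        lo = hi - 2.0 * (hi - lo)
    for _ in range(200):  # bisection; the bound hi itself is never evaluated
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    lam = lo

    def loglik(gamma, lam_):
        # log L(gamma, c) with c * X_{n,n-k}^(-gamma) = (k - lam_) / (n - lam_)
        log_c = gamma * math.log(thr) + math.log((k - lam_) / (n - lam_))
        return (k * (log_c + math.log(gamma))
                - (gamma + 1.0) * sum_log_top
                + (n - k) * math.log((n - k) / (n - lam_)))

    gamma_lam = gamma_hat / (1.0 - lam * s / k)
    return -2.0 * (loglik(gamma_lam, lam) - loglik(gamma_hat, 0.0))
```

A candidate $x_p$ belongs to the level-$\alpha$ region when the returned value is at most $u_\alpha$. Since a $\chi^2(1)$ variable is the square of a standard normal variable, $u_\alpha = z_\alpha^2$ with $z_\alpha$ as in Method I.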



Remark 2. The profile likelihood approach has been employed to con- 
struct confidence regions for high quantiles based on fitting a generalized 
Pareto distribution to exceedances over a deterministic high threshold; see 
[24]. The difference between our Method II and the profile likelihood method 
is that we take the random high threshold into account in our censored like- 
lihood function. 

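Method III: Data tilting method. Here we employ a data tilting method, similar to that of Hall and Yao [12], to construct a confidence region for $x_p$. First, for any fixed weights $q = (q_1, \dots, q_n)$ such that $q_i \ge 0$ and $\sum_{i=1}^{n} q_i = 1$, we solve

$$(\hat\gamma(q), \hat c(q)) = \arg\max_{(\gamma, c)} \sum_{i=1}^{n} q_i \log\Bigl( \bigl(c\gamma X_i^{-\gamma-1}\bigr)^{\delta_i} \bigl(1 - cX_{n,n-k}^{-\gamma}\bigr)^{1-\delta_i} \Bigr).$$

This results in

$$\hat\gamma(q) = \frac{\sum_{i=1}^{n} q_i\delta_i}{\sum_{i=1}^{n} q_i\delta_i (\log X_i - \log X_{n,n-k})}, \qquad \hat c(q) = X_{n,n-k}^{\hat\gamma(q)} \sum_{i=1}^{n} q_i\delta_i.$$

Define

$$D_{p_0}(q) = \begin{cases} (p_0(1 - p_0))^{-1} \Bigl\{ 1 - n^{-1} \sum_{i=1}^{n} (nq_i)^{p_0} \Bigr\}, & \text{if } p_0 \ne 0, 1, \\ -n^{-1} \sum_{i=1}^{n} \log(nq_i), & \text{if } p_0 = 0, \\ \sum_{i=1}^{n} q_i \log(nq_i), & \text{if } p_0 = 1. \end{cases}$$

The function $D_{p_0}(q)$ is a measure of distance between $q$ and the uniform distribution, that is, $q_i = 1/n$. Next, we shall choose $q$ to minimize this distance. More specifically, solve $(2n)^{-1} L(x_p) = \min_q D_{p_0}(q)$ subject to the constraints

$$q_i \ge 0, \qquad \sum_{i=1}^{n} q_i = 1, \qquad \hat\gamma(q) \log\frac{x_p}{X_{n,n-k}} = \log\frac{\sum_{i=1}^{n} q_i\delta_i}{p_n}.$$

The constraint $\hat\gamma(q) \log(x_p/X_{n,n-k}) = \log\bigl(\sum_{i=1}^{n} q_i\delta_i / p_n\bigr)$ is equivalent to $x_p = (p_n/\hat c(q))^{-1/\hat\gamma(q)}$.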
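To make the distance $D_{p_0}(q)$ concrete, here is a small sketch that evaluates it for a given weight vector. The function name `tilting_distance` is hypothetical, and the sketch covers only the distance itself, not the constrained minimization that defines $L(x_p)$.

```python
import math


def tilting_distance(q, p0):
    """Distance D_{p0}(q) between the weights q and the uniform weights 1/n."""
    n = len(q)
    if p0 == 0:
        return -sum(math.log(n * qi) for qi in q) / n
    if p0 == 1:
        # Terms with q_i = 0 contribute 0, the limit of q*log(n*q) as q -> 0.
        return sum(qi * math.log(n * qi) for qi in q if qi > 0)
    return (1.0 - sum((n * qi) ** p0 for qi in q) / n) / (p0 * (1.0 - p0))


# Uniform weights have distance zero; tilted weights have positive distance.
uniform = [0.25, 0.25, 0.25, 0.25]
tilted = [0.4, 0.3, 0.2, 0.1]
d_uniform = tilting_distance(uniform, 1)
d_tilted = tilting_distance(tilted, 1)
```

With $p_0 = 1$ the distance is the Kullback-Leibler divergence of $q$ from the uniform weights, and the statistic is $L(x_p) = 2n$ times its constrained minimum.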

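Here we only consider the case $p_0 = 1$ since other cases are similar and the case $p_0 = 1$ gives good robustness properties. Put

$$A_1(\lambda_1) = 1 - \frac{n-k}{n}\,e^{-1-\lambda_1} \quad\text{and}\quad A_2(\lambda_1) = A_1(\lambda_1)\,\frac{\log(x_p/X_{n,n-k})}{\log(A_1(\lambda_1)/p_n)}.$$

Then, by the standard method of Lagrange multipliers, we have

(10) $$q_i = q_i(\lambda_1, \lambda_2) = \begin{cases} \dfrac{1}{n} \exp\{-1 - \lambda_1\}, & \text{if } \delta_i = 0, \\[2ex] \dfrac{1}{n} \exp\Bigl\{ -1 - \lambda_1 + \lambda_2 \Bigl( \dfrac{\log(x_p/X_{n,n-k})}{A_2(\lambda_1)} - \dfrac{1}{A_1(\lambda_1)} - \dfrac{A_1(\lambda_1) \log(X_i/X_{n,n-k}) \log(x_p/X_{n,n-k})}{A_2^2(\lambda_1)} \Bigr) \Bigr\}, & \text{if } \delta_i = 1, \end{cases}$$

where $\lambda_1$ and $\lambda_2$ satisfy

(11) $$\sum_{i=1}^{n} q_i = 1, \qquad \hat\gamma(q) \log\frac{x_p}{X_{n,n-k}} = \log\frac{\sum_{i=1}^{n} q_i\delta_i}{p_n}.$$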
Theorem 4. Suppose the conditions in Theorem 1 hold. Then, with probability tending to one, there exists a solution to (11), say, $(\hat\lambda_1(x_p), \hat\lambda_2(x_p))$, such that, for $(\lambda_1, \lambda_2) = (\hat\lambda_1(x_p), \hat\lambda_2(x_p))$,

(12) ^ -* '~ ^ 

< -log 1 : 

V n — k 

(13) |A2| ^ fc " /4 ioi(|^ 

and $L(x_{p,0}) \xrightarrow{d} \chi^2(1)$ with $(\lambda_1, \lambda_2) = (\hat\lambda_1(x_{p,0}), \hat\lambda_2(x_{p,0}))$ in the definition of $L(x_{p,0})$.

Hence, based on the above limit, a confidence region with level $\alpha$ for $x_p$ is

$$I^t_\alpha = \{x_p : L(x_p) \le u_\alpha\},$$

where $u_\alpha$ is the $\alpha$-level critical point of $\chi^2(1)$. This confidence region has asymptotically correct coverage probability $\alpha$, that is, $P(x_p \in I^t_\alpha) \to \alpha$ as $n \to \infty$.



Remark 3. In order to compare these three methods theoretically, it is necessary to derive corresponding coverage expansions for $I^l_\alpha$ and $I^t_\alpha$. This requires much work and it will be one of our future topics.



3. Simulation study and real application. 



3.1. A simulation study. In order to compare the performance of con- 
fidence regions based on the normal approximation method, the likelihood 
ratio method and the data tilting method, we conducted a simulation study 
to examine coverage probabilities and approximate lengths of confidence 
regions. 

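We employed the following two distributions: (i) the Burr($\alpha$, $\beta$) distribution, given by $F(x) = 1 - (1 + x^{\beta-\alpha})^{-\alpha/(\beta-\alpha)}$ ($x > 0$); (ii) the Fréchet($\alpha$) distribution, given by $F(x) = \exp(-x^{-\alpha})$ ($x > 0$). Corresponding to (5), we have $\gamma = 1/\alpha$, $\rho = -(\beta - \alpha)/\alpha$ and $A(t) = \alpha^{-1} t^{-(\beta-\alpha)/\alpha}$ for Burr($\alpha$, $\beta$), and $\gamma = 1/\alpha$, $\rho = -1$ and $A(t) = (2\alpha t)^{-1}$ for Fréchet($\alpha$).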
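Both distributions have closed-form inverses, so samples and true high quantiles are easy to produce. The sketch below shows one possible setup for such a simulation using inverse transform sampling. The function names are hypothetical and this is not the authors' simulation code.

```python
import math
import random


def burr_quantile(p, a, b):
    """x with 1 - F(x) = p for F(x) = 1 - (1 + x^(b-a))^(-a/(b-a))."""
    return (p ** (-(b - a) / a) - 1.0) ** (1.0 / (b - a))


def frechet_quantile(p, a):
    """x with 1 - F(x) = p for F(x) = exp(-x^(-a))."""
    return (-math.log1p(-p)) ** (-1.0 / a)


def _uniform_open(rng):
    """Uniform draw on the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def sample_burr(n, a, b, rng):
    # If U is uniform on (0, 1), the quantile at tail probability U has law F.
    return [burr_quantile(_uniform_open(rng), a, b) for _ in range(n)]


def sample_frechet(n, a, rng):
    return [frechet_quantile(_uniform_open(rng), a) for _ in range(n)]


# Illustrative use: one Burr(1, 2) sample and its true 99% quantile.
rng = random.Random(2024)
burr_sample = sample_burr(1000, 1.0, 2.0, rng)
true_quantile = burr_quantile(0.01, 1.0, 2.0)
```

Coverage probabilities can then be estimated by repeating the sampling step and recording how often a region contains `true_quantile`.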

First, we generated 10,000 random samples of size n = 1000 from the
distributions Burr(1,1.5), Burr(1,2), Burr(2,3), Burr(2,4), Fréchet(1) and
Fréchet(2), and then computed coverage probabilities of $I^n_{0.9}$, $I^l_{0.9}$ and $I^t_{0.9}$ for
$p_n = 0.01$ and $p_n = 0.001$. These coverage probabilities are plotted against
different sample fractions k = 20, 25, . . . , 300 in Figures 1-4. From these
figures we observe that the data tilting method is better than the other two
Fig. 1. Coverage probabilities for Burr distributions with $p_n = 0.01$. The coverage probabilities of confidence regions $I^n_{0.90}$, $I^l_{0.90}$ and $I^t_{0.90}$ are plotted against the different sample fractions k = 20, 25, . . . , 300 for different Burr distributions.



methods in terms of coverage accuracy in general, especially for larger values 
of k. Thus, the data tilting method is less sensitive to the bias when a large 
value of k is employed. This may be due to the automatic choice of weights 
qi in the data tilting method. Although it does not make much sense to com- 
pare these three methods with the empirical likelihood method for quantiles 
(see Section 3.6 of [19]), we find that the coverage probabilities based on 
the empirical likelihood method for quantiles are 0.7631 for p n = 0.01 and 
0.6047 for p n = 0.001. These coverage probabilities are not as accurate as 
those based on the other three methods for most sample fractions k. Note 
Fig. 2. Coverage probabilities for Burr distributions with $p_n = 0.001$. The coverage probabilities of confidence regions $I^n_{0.90}$, $I^l_{0.90}$ and $I^t_{0.90}$ are plotted against the different sample fractions k = 20, 25, . . . , 300 for different Burr distributions.



that the empirical likelihood method for quantiles is independent of the 
underlying distribution function. 

Second, we generated 1,000 random samples of size n = 1000 from the distributions Burr(1,2) and Fréchet(1), and then calculated the length of $I^n_{0.9}$ and the approximate lengths of $I^l_{0.9}$ and $I^t_{0.9}$ for $p_n = 0.01$. Let us explain how we calculate the approximate length of $I^t_{0.9}$. The same algorithm was employed to obtain the approximate length of $I^l_{0.9}$. First, we search an $x_p$ near $\hat x_p$ such that $L(x_p) \le u_{0.9}$. Then we both increase and decrease $x_p$ by a small step 0.1 until $L(x_p) > u_{0.9}$. The corresponding values are denoted by $x^u_p$ and $x^l_p$, respectively. Thus, we approximate $I^t_{0.9}$ by the interval $[x^l_p, x^u_p]$.
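The search just described can be sketched as a generic routine that works for any statistic, whether $L(x_p)$ or $l(x_p)$. The function name `approximate_region`, the iteration cap `max_steps` and the guard that stops the downward search before reaching zero are assumptions added here for safety. They are not part of the described algorithm.

```python
def approximate_region(stat, x_start, critical, step=0.1, max_steps=100000):
    """Step outward from x_start until stat exceeds critical on each side.

    Returns (lower, upper), the first grid points on each side at which
    stat(x) > critical, as in the search described above.
    """
    if stat(x_start) > critical:
        raise ValueError("x_start must satisfy stat(x_start) <= critical")

    upper = x_start
    for _ in range(max_steps):
        upper += step
        if stat(upper) > critical:
            break
    else:
        raise RuntimeError("upper endpoint not found within max_steps")

    lower = x_start
    for _ in range(max_steps):
        if lower - step <= 0:  # assumption: the quantile is positive
            break
        lower -= step
        if stat(lower) > critical:
            break
    else:
        raise RuntimeError("lower endpoint not found within max_steps")

    return lower, upper


# Illustrative use with a toy convex statistic in place of L(x_p);
# 2.7055 is approximately the 0.90 critical point of chi-square(1).
lower, upper = approximate_region(lambda x: (x - 10.0) ** 2, 10.0, 2.7055)
```

Because the endpoints are the first grid points outside the region, the reported length overstates the true length by at most two steps, and the routine finds only the connected piece around the starting value.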
Fig. 3. Coverage probabilities for Fréchet distributions with $p_n = 0.01$. The coverage probabilities of confidence regions $I^n_{0.90}$, $I^l_{0.90}$ and $I^t_{0.90}$ are plotted against the different sample fractions k = 20, 25, . . . , 300 for different Fréchet distributions.



The averages of these approximate lengths are plotted against the different sample fractions k = 20, 30, . . . , 300 in Figure 5. We notice that the approximate confidence interval lengths based on the data tilting method are smallest for most cases.

Third, we generated a random sample of size n = 1000 from the distributions Burr(1,2) and Fréchet(1), and then computed the data tilting likelihood function for $p_n = 0.01$ and $x_p = x_{p,0} - 50 + i$, $i = 0, 1, \dots, 200$, where $x_{p,0}$ denotes the true quantile. We took k = 50 and 100. Figure 6 indicates that the data tilting likelihood function is approximately convex, which suggests that $I^t_{0.9}$ may indeed be an interval. Unlike the empirical likelihood method for means, we were unable to prove that $I^t_\alpha$ is an interval.

Fig. 4. Coverage probabilities for Fréchet distributions with $p_n = 0.001$. The coverage probabilities of confidence regions $I^n_{0.90}$, $I^l_{0.90}$ and $I^t_{0.90}$ are plotted against the different sample fractions k = 20, 25, . . . , 300 for different Fréchet distributions.

Fig. 5. Averages of the approximate confidence lengths with $p_n = 0.01$. The averages of approximate lengths of confidence regions $I^n_{0.90}$, $I^l_{0.90}$ and $I^t_{0.90}$ are plotted against the different sample fractions k = 20, 30, . . . , 300 for Burr(1.0, 2.0) and Fréchet(1) distributions.



Fig. 6. Data tilting likelihood function $L(x_p)$ with $p_n = 0.01$. The data tilting likelihood functions are plotted against different $x_p = x_{p,0} - 50 + i$, $i = 0, 1, \dots, 200$, for Burr(1.0, 2.0) and Fréchet(1), where $x_{p,0}$ denotes the true quantile. We took k = 50 and k = 100.
Fig. 7. Danish fire loss data. This consists of 2156 losses over one million Danish Kroner
(DKK) from the years 1980 to 1990, inclusive.



In summary, our simulation study for sample size n = 1000 prefers the 
data tilting method, which gives the best coverage accuracy in general, is less 
sensitive to the choice of sample fraction k, and has a shorter approximate 
interval length in most cases. Although we do not report the simulation 
study for sample size n = 200, the same conclusions as above are drawn, 
except that Method II performs worst. 



3.2. A real application. The data set we shall analyze consists of 2156 
Danish fire losses over one million Danish Kroner (DKK) from the years 
1980 to 1990 inclusive (see Figure 7). The loss figure is a total loss for the 
event concerned and includes damage to buildings, damage to furnishings 
and personal property, as well as loss of profits. This data set was analyzed 
by McNeil [16] and Resnick [22], where the right tail index was confirmed 
to be between 1 and 2. Further, Peng [20] applied the empirical likelihood 
method to this data set to obtain a confidence interval for the mean. 

We took $p_n = 0.001$ and plotted the confidence interval $I^n_{0.90}$, and the approximate confidence intervals $I^l_{0.90}$ and $I^t_{0.90}$, against the different sample fractions k = 60, 65, . . . , 400 in Figure 8. We note again that the approximate interval lengths based on the data tilting method are smallest for most cases.

APPENDIX A: PROOFS OF THEOREMS 2, 3 AND 4 



PROOF of Theorem 2. Let V\, V2, . . . , V n be i.i.d. random variables uni- 
formly distributed over (0, 1) and V nj \ < V n ,2 < • ■ • < Vn,n be the order statis- 
tics of Vi,V 2 ,...,V n . Define c n = 1 - and d n = y/c n (l - c n )/(n + 1). 
Fig. 8. Approximate confidence intervals for Danish fire loss data. The approximate confidence intervals with level 0.90 based on the normal approximation method (Method I), the likelihood ratio method (Method II) and the data tilting method (Method III) are plotted against k = 60, 65, . . . , 400. We took $p_n = 0.001$.



Then the density function of {V n ,n-k — c n )/d n is 

<f>n(u) : 



(C n + d n u) n k l {l-C n -d n u) h 



k\(n-k-l)\ 

if < c n + d n u < 1, 
0, otherwise. 



For each —c n /d n < u < (1 — c n )/d n , define 
Y nJ (u) = 7 {logtf((l - ^ - d n u)- l {\ - Vi)~ l ) - logC/((l - Cn - d n uy 1 )}, 

Hn = Vk 1 {% l - 1 - 1 ), 
k 

H n {u) = ^=Y,{Y n ,{u)-l), 

nW " V niog(fc/(np n )) i0g [7(1/(1 -c n - d n u)) T 
Since 

Triv^ , X„ ^fn^/k X nn —k 



b-^p In V rv , -^-n,n—k , [T 
g — = i 7777 7T log hVfe 



\og(k/(np n )) x p log(k/(np n )) x p 

and 

\log(k/(np n )) x p ~ 
J i



P -y" 1 < log 

ln ~ l-x/Vklog(k/(np n )) & X n , n _ k 

P(H n < ——{x + y/k( z n Z rrlog 



.T 



V 



(j) n {u) du 



1 - x/Vk I Vog(k/(np n )) X n ^ n _ k 

for \x\ < A; 1 / 4 , it follows from Lemma 2.2 of [2] that 

riA\-o( ^n^k i £ p \ f°° ( . . x + r n (u)\ 

(14) P ( log( t /(np„)) "' g ^- I J =L P { H " [U) - T^iTt) 

for |ai| < A;" 1 / 4 . Similar to Lemma 2.3 of [2], we can prove that 
of it ( n ^ x + r n (n)\ 

(15) = *( + f J_{! _ f e+iM) 

\l-x/VkJ \l-x/^J 3Vk\ \l-x/VkJ 

- *(fr^) T^^*> + "(^ + ^"™) • 

uniformly for x £ R and |u| < A; 1 / 4 . Since (5) is equivalent to 

Hm \og(U(tx)) - log(U(t)) - log(g)/7 _ xP-l 
t^oo A(t) p ' 

it follows from Potter's bounds that 

2 

u u u 
r nW = 1 — 7777 7\ + TTTT 7 ; 7777 77 + 



log(k/(np n )) 2Vklog(k/(np n )) 3klog(k/(np n )) 
,, Vk\A{n/k)\ 1 



,log(fe/(np n )) ^/fclog^/fnpn))/' 
uniformly for |it| < A; 1 / 4 . Hence, 



- r n (u) \ 



u /—I ( k \ ~ 2 

(16) = <j){x)- j—r- rr + 4>{x)x 2 / Vk - -X(p(x)u 2 log 

log(k/(np n )) 2 \ np n J 

VV npj Vfc log(A,7(np n ))y v 

uniformly for |x| < A; 1 / 4 and \u\ < A; 1 / 4 . Thus, the theorem follows from 
(14)-(16) and the facts that ) \u\ l (j) n (u) du = 0(1) and j \u\ l (i) n (u)I(\u\ > 
fc 1 /*) = o(l/fc) for any * > 0. □ 
Proof of Theorem 3. By the definition of $\gamma(\lambda)$, we have

(17) $$\gamma(\lambda) = \frac{\hat\gamma_n}{1 - (\lambda/k)\,\hat\gamma_n \log(x_p/X_{n,n-k})}.$$

Thus, equations (7) and (8) are equivalent to

(18) $$\frac{\hat\gamma_n \log(x_p/X_{n,n-k})}{1 - (\lambda/k)\,\hat\gamma_n \log(x_p/X_{n,n-k})} + \log\frac{(n - \lambda)p_n}{k - \lambda} = 0$$

and

(19) $1 - \dfrac{\lambda}{k}\,\hat\gamma_n \log(x_p/X_{n,n-k}) > 0$ and $\lambda < k$,

respectively. Set

$$g(\lambda) = \frac{\hat\gamma_n \log(x_p/X_{n,n-k})}{1 - (\lambda/k)\,\hat\gamma_n \log(x_p/X_{n,n-k})} + \log\frac{(n - \lambda)p_n}{k - \lambda}.$$

Then $g(\lambda)$ is continuous and increasing in $\lambda$ under the restriction (19) since

$$g'(\lambda) = \frac{[\hat\gamma_n \log(x_p/X_{n,n-k})]^2}{k\,[1 - (\lambda/k)\,\hat\gamma_n \log(x_p/X_{n,n-k})]^2} + \frac{n - k}{(n - \lambda)(k - \lambda)} > 0.$$

If $x_p/X_{n,n-k} > 1$, then (19) is equivalent to

$$\lambda < \min\bigl(k,\ k\,[\hat\gamma_n \log(x_p/X_{n,n-k})]^{-1}\bigr) =: a_n.$$

Since $g(-\infty) = \log p_n < 0$ and $g(a_n-) = \infty$, we conclude that there exists a unique $\lambda < a_n$ such that $g(\lambda) = 0$, that is, there exists a unique $\lambda$ satisfying (18) and (19). We can draw the same conclusion for the cases $x_p/X_{n,n-k} = 1$ and $x_p/X_{n,n-k} < 1$. So we prove the existence and uniqueness of a solution to (7) and (8).
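Since

$$\log \hat x_p = -\frac{1}{\hat\gamma_n} \log\frac{p_n}{\hat c_n} = -\frac{1}{\hat\gamma_n} \Bigl( \log p_n - \log\frac{k}{n} - \hat\gamma_n \log X_{n,n-k} \Bigr),$$

we have

$$\log(\hat x_p/X_{n,n-k}) = -\frac{1}{\hat\gamma_n} \Bigl( \log p_n - \log\frac{k}{n} \Bigr) = \frac{1}{\hat\gamma_n} \log\frac{k}{np_n}$$

and

$$\hat\gamma_n \log(x_p/X_{n,n-k}) = \log\frac{k}{np_n} - \hat\gamma_n \log\frac{\hat x_p}{x_p}.$$
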

It follows from (6) that ^-1-^0. Thus, 

(20) $$\hat\gamma_n \log(x_p/X_{n,n-k}) = \Bigl(\log\frac{k}{np_n}\Bigr)(1 + o_p(1)) \xrightarrow{p} \infty.$$

Denote the solution to (7) and (8) by $\hat\lambda_n$. Thus, $g(\hat\lambda_n) = 0$. First we will show that

(21) $$P\Bigl( |\hat\lambda_n| < \frac{k}{[\hat\gamma_n \log(x_p/X_{n,n-k})]^2} \Bigr) \to 1.$$

This is equivalent to proving that, as $n \to \infty$,

(22) $P(g(b_n) > 0) \to 1$ and $P(g(-b_n) < 0) \to 1$,

where $b_n = k/[\hat\gamma_n \log(x_p/X_{n,n-k})]^2$.
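By (20) and Taylor's expansion,

$$\begin{aligned} g(b_n) &= \hat\gamma_n \log(x_p/X_{n,n-k}) \Bigl( 1 + \frac{b_n}{k}\,\hat\gamma_n \log(x_p/X_{n,n-k})(1 + o_p(1)) \Bigr) - \log\frac{k}{np_n} + \log\Bigl(1 - \frac{b_n}{n}\Bigr) - \log\Bigl(1 - \frac{b_n}{k}\Bigr) \\ &= \hat\gamma_n \log(x_p/X_{n,n-k}) - \log\frac{k}{np_n} + \frac{b_n}{k}(1 + o_p(1)) + \frac{b_n}{k}\,[\hat\gamma_n \log(x_p/X_{n,n-k})]^2 (1 + o_p(1)) \\ &= -\hat\gamma_n \log\frac{\hat x_p}{x_p} + \frac{b_n}{k}\,[\hat\gamma_n \log(x_p/X_{n,n-k})]^2 (1 + o_p(1)) \\ &= 1 + o_p(1). \end{aligned}$$

Similarly, $g(-b_n) = -1 + o_p(1)$. This yields (22) and, hence, (21).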
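Since (20) and (21) imply

(23) $$\frac{\hat\lambda_n}{k}\,\hat\gamma_n \log(x_p/X_{n,n-k}) \xrightarrow{p} 0,$$

using Taylor's expansion again, we have

$$0 = g(\hat\lambda_n) = -\hat\gamma_n \log\frac{\hat x_p}{x_p} + \frac{\hat\lambda_n}{k}\,[\hat\gamma_n \log(x_p/X_{n,n-k})]^2 (1 + o_p(1)),$$

that is,

(24) $$\hat\lambda_n = \frac{k\,\hat\gamma_n \log(\hat x_p/x_p)}{[\hat\gamma_n \log(x_p/X_{n,n-k})]^2}\,(1 + o_p(1)) = \frac{k\,\hat\gamma_n \log(\hat x_p/x_p)}{(\log(k/(np_n)))^2}\,(1 + o_p(1)).$$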
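Note that

$$\begin{aligned} \log L(\gamma, c) &= k \log(c\gamma) - (\gamma + 1) \sum_{i=1}^{k} \log X_{n,n-i+1} + (n - k) \log\bigl(1 - cX_{n,n-k}^{-\gamma}\bigr) \\ &= k \log(\gamma) - (\gamma + 1)\,k\,\hat\gamma_n^{-1} + k \log\bigl(cX_{n,n-k}^{-\gamma}\bigr) - k \log(X_{n,n-k}) + (n - k) \log\bigl(1 - cX_{n,n-k}^{-\gamma}\bigr) \end{aligned}$$

and

$$\begin{aligned} l(x_p) &= -2\bigl( \log L(\gamma(\hat\lambda_n), c(\hat\lambda_n)) - \log L(\hat\gamma_n, \hat c_n) \bigr) \\ &= -2k \Bigl\{ \log\frac{\gamma(\hat\lambda_n)}{\hat\gamma_n} - \Bigl( \frac{\gamma(\hat\lambda_n)}{\hat\gamma_n} - 1 \Bigr) \Bigr\} - 2k \log\Bigl(1 - \frac{\hat\lambda_n}{k}\Bigr) + 2n \log\Bigl(1 - \frac{\hat\lambda_n}{n}\Bigr). \end{aligned}$$

In view of (17) and (23), we have

$$\frac{\gamma(\hat\lambda_n)}{\hat\gamma_n} = \frac{1}{1 - (\hat\lambda_n/k)\,\hat\gamma_n \log(x_p/X_{n,n-k})} \xrightarrow{p} 1.$$

It follows from (6) and (24) that

(25) $\hat\lambda_n/\sqrt{k} \xrightarrow{p} 0$.

Hence, by (20), (6), (25) and Taylor's expansion,

$$\begin{aligned} l(x_p) &= k \Bigl( \frac{\gamma(\hat\lambda_n)}{\hat\gamma_n} - 1 \Bigr)^2 (1 + o_p(1)) + O_p(\hat\lambda_n^2/k) \\ &= \frac{(\hat\lambda_n \hat\gamma_n \log(x_p/X_{n,n-k}))^2}{k}\,(1 + o_p(1)) + o_p(1) \\ &= \Bigl( \frac{\hat\gamma_n \sqrt{k} \log(\hat x_p/x_p)}{\hat\gamma_n \log(x_p/X_{n,n-k})} \Bigr)^2 (1 + o_p(1)) + o_p(1) \\ &= \Bigl( \frac{\hat\gamma_n \sqrt{k} \log(\hat x_p/x_p)}{\log(k/(np_n))} \Bigr)^2 (1 + o_p(1)) + o_p(1) \\ &\xrightarrow{d} \chi^2(1). \end{aligned}$$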

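Proof of Theorem 4. Let $Z_i = \log(X_{n,n-i+1}/X_{n,n-k})$ for $1 \le i \le k$, $q_{(i)} = \frac{1}{n} \exp\{-1 - \lambda_1\}$ for $k + 1 \le i \le n$ and

$$q_{(i)} = \frac{1}{n} \exp\Bigl\{ -1 - \lambda_1 + \lambda_2 \Bigl( \frac{\log(x_p/X_{n,n-k})}{A_2(\lambda_1)} - \frac{1}{A_1(\lambda_1)} - \frac{A_1(\lambda_1)\,Z_i \log(x_p/X_{n,n-k})}{A_2^2(\lambda_1)} \Bigr) \Bigr\}$$

for $1 \le i \le k$. Then (11) is equivalent to

$$\sum_{i=1}^{k} q_{(i)} = A_1(\lambda_1) \quad\text{and}\quad \sum_{i=1}^{k} q_{(i)} Z_i = A_2(\lambda_1).$$

Furthermore, this is equivalent to

(26) $$\sum_{i=1}^{k} q_{(i)} = A_1(\lambda_1) \quad\text{and}\quad \frac{\sum_{i=1}^{k} q_{(i)} Z_i}{\sum_{i=1}^{k} q_{(i)}} = \frac{\log(x_p/X_{n,n-k})}{\log(A_1(\lambda_1)/p_n)}.$$

The second identity in (26) is

(27) $$\frac{\sum_{i=1}^{k} \exp\{-\lambda_2 A_1(\lambda_1) Z_i \log(x_p/X_{n,n-k})/A_2^2(\lambda_1)\}\,Z_i}{\sum_{i=1}^{k} \exp\{-\lambda_2 A_1(\lambda_1) Z_i \log(x_p/X_{n,n-k})/A_2^2(\lambda_1)\}} = \frac{\log(x_p/X_{n,n-k})}{\log(A_1(\lambda_1)/p_n)}.$$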

In order to demonstrate the existence of a solution to equation (11), first we will show that, with probability tending to one, there exists a continuous function $\lambda_2 = \lambda_2(\lambda_1)$ such that, for each $\lambda_1$, $(\lambda_1, \lambda_2) = (\lambda_1, \lambda_2(\lambda_1))$ is the solution to (27), and then we should prove that, for some $\lambda_1$, $(\lambda_1, \lambda_2) = (\lambda_1, \lambda_2(\lambda_1))$ is also the solution to the first identity of (26), and the solution satisfies both (12) and (13). To this end, set

$$f(\lambda) = \frac{\sum_{i=1}^{k} \exp\{-\lambda Z_i\}\,Z_i}{\sum_{i=1}^{k} \exp\{-\lambda Z_i\}}.$$

Then it is easy to see that $\lim_{\lambda\to-\infty} f(\lambda) = Z_1$, $\lim_{\lambda\to\infty} f(\lambda) = Z_k$ and $f(\lambda)$ is decreasing in $\lambda$ by checking that $\frac{d}{d\lambda} \log f(\lambda) < 0$. Therefore, there exists a unique continuous function $r(x)$ such that $f(r(x)) = x$ for any $x \in (Z_k, Z_1)$. From now on we restrict $\lambda_1$ such that (12) holds, that is,



(28) 



1 



^logjk / (np n )) 



<Ai(Ai)<-f 
n \ 



1 + 



v /log(A;/(np w )) 
Vk 



which implies 

(29) 

and 



log(Ai(Ai)/p n 



< 



\og{k/{np n )) 

\og{x p /X n ^ n _ k ) 
\og{k/{np n )) V Vk^log(k/(np n )) 



(30) 



Set *Fyi 

(31) 



\og(x p /X n , n ^ k ) 
log(Ai(Ai)/p n ) 
\og{x p /X ntn _ k ) 
\og{k/(np n )) 

\og(x p /X n ^ n _ k ) 



< 



< 



1 + 



1 



{Zk < Ig^f < Zi}. Then P(T n 
log(x p /X n>n - k ) 



Vk^/log(k/(np n )) 
1 since 



p -l 
-*"7 , 



\og(k/{np n )) 
By definition, f(r K log[Ai{Xi)/pn) 

'\og{x p /X n , n _ k ) 



oo, 



Z k ^0. 



(32) 



A 2 = A 2 (Ai) =r 



Itx!^) on This im p lies that 



log(Ai(Ai)/p n )7 A 1 (\ 1 )log(x p /X n>n _ k ) 
is the unique solution to equation (27), with probability tending to one.
Set 

= \og{x p /X n ^ n _ k ) \og(x p /X n ^ k ) - f \og(x p /X ntn - k ) 

1 " log^iCAi)/^) \og{k/{np n )) an T 'Ug(^ 1 (A 1 )/p n ) 
It follows from (30) that 

(33) i?i = O p (^ 1 /2 (log(A , /(npn))) "i/2 ) 

holds uniformly for Ai under the restriction (12) or, equivalently, (28). Here- 
after all terms O p {-) and o p (-) are assumed to hold uniformly for Ai if Ai is 
involved. 

Using (6), (33) and 

log(x p /X n 

,n—k) - -1 

_ \og(x p /X n 

,n—k) ~— l D 

log^AOM,) ~ ln hg(k/(n Pn )) ~ ln + 1 

( 34 ) 

1 Xp 

\og(k/(np n )) ° g x p 

we have 

m) \og{ Xp /X n ^ k ) _! _ 1/2 

(35) ^(^(AO/P.) 7 " } - 

On the other hand, from Taylor's expansion, we have /(ifc^ 1 / 4 ) — /(0) = 
±fc- 1 / 4 /'(0)(l + o p (l)), where /(0) = 7" 1 , and 

(36) /'(0) = -^E^-7n 2 ) =-7" 2 (l + P (^ 1/2 ))- 

For the proof for the last step of (36), see, for example, [4] or [9]. Hence, 

P V {k } < ^tMMjJpV) ) 

and 



V [ )> log(A 1 (X 1 )/p n )J 



Therefore, A = r( ^^fj^ ) satisfies P(A G (-fc" 1 / 4 , fc" 1 / 4 )) -► 1. Thus, 
from Taylor's expansion, we obtain 

/(A) - 7" 1 = A/'(0)(1 + O.ik- 1 /*)) = - 7 " 2 A(1 + P (^ 1/4 )), 

which, coupled with (34), yields 



A = —7 

(37) 



2 f lQg(W*n,n-fc) _ .-A fl Q ( , -1/4^ 
Then, by using (35) and Taylor's expansion, we have

k 

(38) ^exp{-AZ l } = A;(l + O p (A;- 1 / 2 )). 

i=i 



Note that 



A2 — A2(Ai) 



A^1(A : 



Ai(Ai)log(xp/X ri) „_ fe ) : 



where A is a function of Ai as well. Plug A2 = A2(Ai) into and set h(X±] 
J2i=i Q(i) ~ -^i(Ai). Then /i(Ai) has the expression 



— exp{-l - Ai}exp(— — — — 
n { Ai(\i)log{xp/X n , n -k) 

{ log(x p /X n>n _k) 

Put 



X n ^n— k) 



V tl k J 

VkyJ\og(k/(np n ))' 



and 



Ai' = -log 1 

V Tl — K 

It is easy to check that 

\/k^\og(k / (np n )) 



-l-Ai = j 



+ 



n—k 



(39) A 1 (Aj) = ^(l-^ s( ^ Wflt)) 



wiM ^-l^i Vlog(fc/(np n )) / y/log(/c/(np n )) 

MXi)=7 n{ 1 — Tk — + n — Tk — 

The first two identities are obvious. The third one follows from the second 
one, equation (35) and a well-known result for the Hill estimator, that is, 

v^^-T- 1 )^^- 2 ). 

Now it follows from (39) and (38) that 



ft(A;) jy!^ (1+0p(1)) , 
Similarly,

MA , /) = JvMfe)l (1 + 0p(1)) . 

Hence, with probability tending to one, there exists a Ai satisfying (12) and 
the first equation in (26) with A2 = A2(Ai) defined in (32); that is, we have 
shown the existence of the solution to (11) such that (12) and (13) hold. 

We still need to estimate Ai from the equation /i(Ai) = 0, which is equiv- 
alent to 

lexp(Af l0g(Xp/X "'- fc) - lQ g(V*»^) )|f eM-m 

(40) 

~ n — k 
= expjl + Ai} . 

n 

It follows from (35) that 

\og(x p /X n<n _ k ) \og(x p /X n 

,n—k) _ — 1 /-, \ 

log^Ai)/^) _ (log(^l(A 1 )/p n ))2 ~ ln 0p{h 

Hence, applying Taylor's expansion to both sides of (40) yields 

V n 

It is easy to show that maxi<j< n \nq{ — 1| = o p (l). Thus, 



(41) l + Ai = O p | 



L(x p ) = 2^2nqi{nqi - 1 - \(nq,i - 1) 2 (1 + o p (l))} 
i=l 

n 

= Y J {nq l -lf{l + o p {l)) 

i=l 

n 

= ^(log(n % )) 2 (l + o p (l)) 
i=l 

= (1 + o p (l)) ((n - k)(l + Ax) 2 + X>g(ng (i) )) 2 ) . 
It follows from (29) and (35) that 

W UgCAiCAO/pn) (log(Ai(Ai)/p n )) 2 

= -(l + Ai) 



A(^-7,- 1 + P (^ l0S( ^ )) + 



log(/e/ (np„)) 
uniformly for i = 1



. . . , k. As in (36), we have 




n 



-2 



) 



hy-^l + Orik- 1 / 2 )). 



Hence, by (6), (37) and (41), 



L(x p ) = (l + o p (l)) n(l + Ai) 2 + A 2 



i=l 





□ 